Take-home Exercise 1: Geospatial Analytics for Public Good

Published

December 2, 2023

Modified

December 3, 2023

1. Overview

The digitization of urban infrastructures, such as buses and public utilities, generates datasets that, when analyzed through technologies like GPS, reveal valuable patterns in human movement. This information enhances urban management and aids both private and public transport providers in making informed decisions for a competitive edge in the evolving urban landscape. Hence there is value in performing analysis on such information, which is further showcased in this exercise.

1.1 Objectives

In this exercise, the focus will be to identify, analyse and compare the spatial patterns, through using of Geovisualisation and Analysis, and Local Indicators of Spatial Association (LISA) to uncover the spatial and spatio-temporal mobility patterns of public bus passengers in Singapore.

The specific tasks of this exercise are as follows:

Geovisualisation and Analysis

  • With reference to the time intervals provided in the table below, compute the passenger trips generated by origin at the hexagon level,

    Peak hour period Bus tap on time
    Weekday morning peak 6am to 9am
    Weekday afternoon peak 5pm to 8pm
    Weekend/holiday morning peak 11am to 2pm
    Weekend/holiday evening peak 4pm to 7pm
  • Display the geographical distribution of the passenger trips by using appropriate geovisualisation methods,

  • Describe the spatial patterns revealed by the geovisualisation (not more than 200 words per visual).

Local Indicators of Spatial Association (LISA) Analysis

  • Compute LISA of the passengers trips generate by origin at hexagon level.

  • Display the LISA maps of the passengers trips generate by origin at hexagon level. The maps should only display the significant (i.e. p-value < 0.05)

  • With reference to the analysis results, draw statistical conclusions (not more than 200 words per visual).

2. Installing and Loading the R Packages

For this exercise, in the following code chunk, p_load() from pacman package is used to install and load the following R packages into the R environment:

  • sf for importing, managing, and processing geospatial data,

  • sfdep to compute spatial weights using appropriate functions

  • tmap for creating thematic maps,

  • tidyverse for performing data science tasks such as importing, wrangling and visualising data,

  • knitr for creating html table.

pacman::p_load(sf, sfdep, tmap, tidyverse, knitr)

3. Data Preparation

3.1 Data

There are 2 datasets used, as outlined in sections 3.1 and 3.2

3.1.1 Apastial data

For the purpose of this take-home exercise, Passenger Volume by Origin Destination Bus Stops downloaded from LTA DataMall will be used. Relevant data for the month of August, September and Oct 2023 had been downloaded. For this exercise, we will observe this dataset for the time period of Oct 2023. This can reproduced later on by point to the csv file for the required month.

3.1.2 Geospatial data

Two geospatial data will be used in this study, they are:

  • Bus Stop Location from LTA DataMall. It provides information about all the bus stops currently being serviced by buses, including the bus stop code (identifier) and location coordinates.

  • hexagon, a hexagon layer of 250m (this distance is the perpendicular distance between the centre of the hexagon and its edges or in short “Apothem distance” ) should be created using Bus Stop Location and Passenger Volume by Origin Destination Bus Stops data.

3.2 Getting the Data Into R Environment

3.2.1 Importing the aspatial data

We will import the Passenger Volume By Origin Destination Bus Stops data set for Oct 2023 downloaded from LTA Datamall by using read_csv().

odbus <- read_csv("data/aspatial/origin_destination_bus_202310.csv")

Using glimpse(odbus), we can see that there are 5,694,297 rows of datapoints in this dataset.

A quick check of odbus tibble data frame also shows that the values in ORIGIN_PT_CODE and DESTINATON_PT_CODE are already in <char> type. Hence, no adjustments needed to the data type.

glimpse(odbus)
Rows: 5,694,297
Columns: 7
$ YEAR_MONTH          <chr> "2023-10", "2023-10", "2023-10", "2023-10", "2023-…
$ DAY_TYPE            <chr> "WEEKENDS/HOLIDAY", "WEEKDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR       <dbl> 16, 16, 14, 14, 17, 17, 17, 7, 14, 14, 10, 20, 20,…
$ PT_TYPE             <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE      <chr> "04168", "04168", "80119", "80119", "44069", "2028…
$ DESTINATION_PT_CODE <chr> "10051", "10051", "90079", "90079", "17229", "2014…
$ TOTAL_TRIPS         <dbl> 3, 5, 3, 5, 4, 1, 24, 2, 1, 7, 3, 2, 5, 1, 1, 1, 1…

3.2.2 Extracting the relevant study data from the aspatial data

For the purpose of this exercise, we will extract commuting flows based on the intervals denoted in section 1.1

However, I realized that the weekday afternoon peak stats later than weekend/holiday evening peak.

For purposes of consistency and comparison, I will rename peak hour period “Weekday afternoon peak” to “Weekday evening peak”

Peak hour period Bus tap on time Corresponding Table Name
Weekday morning peak 6am to 9am origin6_9
Weekday afternoon Weekday evening peak 5pm to 8pm origin17_20
Weekend/holiday morning peak 11am to 2pm origin11_14
Weekend/holiday evening peak 4pm to 7pm origin16_19

The following data wrangling and transformation functions will be used:

origin6_9 <- odbus %>%
  filter (DAY_TYPE == "WEEKDAY") %>%
  filter (TIME_PER_HOUR >= 6 & TIME_PER_HOUR <=9) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

origin17_20 <- odbus %>%
  filter (DAY_TYPE == "WEEKDAY") %>%
  filter (TIME_PER_HOUR >= 17 & TIME_PER_HOUR <=20) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

origin11_14 <- odbus %>%
  filter (DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter (TIME_PER_HOUR >= 11 & TIME_PER_HOUR <=14) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

origin16_19 <- odbus %>%
  filter (DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter (TIME_PER_HOUR >= 16 & TIME_PER_HOUR <=19) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

We will save the output in rds format for future use.

write_rds(origin6_9, "data/rds/origin6_9.rds")
write_rds(origin17_20, "data/rds/origin17_20.rds")
write_rds(origin11_14, "data/rds/origin11_14.rds")
write_rds(origin16_19, "data/rds/origin16_19.rds")

A check on origin6_9 data table indicates the it is correctly mapped.

origin6_9 <- read_rds("data/rds/origin6_9.rds")
origin17_20 <- read_rds("data/rds/origin17_20.rds")
origin11_14 <- read_rds("data/rds/origin11_14.rds")
origin16_19 <- read_rds("data/rds/origin16_19.rds")
kable(head(origin6_9))
ORIGIN_PT_CODE TRIPS
01012 1770
01013 841
01019 1530
01029 2483
01039 2919
01059 1734
kable(head(origin17_20))
ORIGIN_PT_CODE TRIPS
01012 8000
01013 7038
01019 3398
01029 9089
01039 12095
01059 2212
kable(head(origin11_14))
ORIGIN_PT_CODE TRIPS
01012 2177
01013 1818
01019 1536
01029 3217
01039 5408
01059 1159
kable(head(origin16_19))
ORIGIN_PT_CODE TRIPS
01012 3061
01013 2770
01019 1685
01029 4063
01039 7263
01059 1118

3.2.3 Exploratory data analysis on data

3.2.3.1 EDA on weekday morning peak data

Exploratory Data Analysis indicates following observations:

#Total trips during weekday morning peak for the month
sum(origin6_9$TRIPS)
[1] 25976410
#Maximum trips during weekday morning peak for the month from 1 origin
max(origin6_9$TRIPS)
[1] 357043

Checks in data notes that the highest no. of bus trip originates from Boon Lay Interchange (Loc_DESC)

#Minimum trips during weekday morning peak for the month from 1 origin
min (origin6_9$TRIPS)
[1] 1
#Median no. of trips during weekday morning peak from 1 origin
median(origin6_9$TRIPS)
[1] 2198
#Distribution check
hist(origin6_9$`TRIPS`)

Distribution is skewed heavily to the left.

3.2.3.2 EDA on weekday evening peak data

Exploratory Data Analysis indicates following observations:

#Total trips during weekday evening peak for the month
sum(origin17_20$TRIPS)
[1] 25075097
#Maximum trips during weekday evening peak for the month from 1 origin
max(origin17_20$TRIPS)
[1] 529081

Checks in data notes that the highest no. of bus trip originates from Boon Lay Interchange (Loc_DESC)

#Minimum trips during weekday evening peak for the month from 1 origin
min (origin17_20$TRIPS)
[1] 1
#Median no. of trips during weekday evening peak from 1 origin
median(origin17_20$TRIPS)
[1] 1972
hist(origin17_20$`TRIPS`)

Distribution is skewed heavily to the left.

3.2.3.3 EDA on Weekend/holiday morning peak data

Exploratory Data Analysis indicates following observations:

#Total trips during Weekend/holiday morning peak for the month
sum(origin11_14$TRIPS)
[1] 7737391
#Maximum trips during Weekend/holiday morning peak for the month from 1 origin
max(origin11_14$TRIPS)
[1] 103358

Checks in data notes that the highest no. of bus trip originates from Boon Lay Interchange (Loc_DESC)

#Minimum trips during Weekend/holiday morning peak for the month from 1 origin
min (origin11_14$TRIPS)
[1] 1
#Median no. of trips during Weekend/holiday morning peak from 1 origin
median(origin11_14$TRIPS)
[1] 666
hist(origin11_14$`TRIPS`)

Distribution is skewed heavily to the left.

3.2.3.4 EDA on Weekend/holiday evening peak data

Exploratory Data Analysis indicates following observations:

#Total trips during Weekend/holiday evening peak for the month
sum(origin16_19$TRIPS)
[1] 7769394
#Maximum trips during Weekend/holiday evening peak for the month from 1 origin
max(origin16_19$TRIPS)
[1] 144903

Checks in data notes that the highest no. of bus trip originates from Boon Lay Interchange (Loc_DESC)

#Minimum trips during Weekend/holiday evening peak for the month from 1 origin
min (origin16_19$TRIPS)
[1] 1
#Median no. of trips during Weekend/holiday evening peak from 1 origin
median(origin16_19$TRIPS)
[1] 629
hist(origin16_19$`TRIPS`)

Distribution is skewed heavily to the left.

3.2.3.5 EDA comparison across the different peak periods

Some observations noted for bus trips in Oct 2023:

  • All distributions are heavily skewed to the left.

  • Total bus trips for the month for both weekday peaks for both morning and afternoon are quite similar, 26,088,201 (morning) vs 25,151,947 (afternoon)

  • Total bus trips for the month for both weekend/holiday peaks for both morning and evening are quite similar, 7,766,064 (morning) vs 7,797,319 (evening)

  • Total bus trips during weekday peaks are circa 3 times more than bus trips during weekend/holiday peaks.

  • From the available data, the highest number of trips (529,081) for the month from across all peak periods occurred during weekday evening peak, and it originated from Boon Lay Interchange.

  • During all peak periods, the highest number of bus trips originates from Boon Lay Interchange. Based on public research, it is situated within Jurong Point Shopping Mall and integrated with the nearby Boon Lay MRT Station. This interchange also serves a variety of passengers, including those from Nanyang Technological University, Jurong Industrial Estate and Tuas Industrial Estate. It also has 27 bus services and it is among the largest and busiest bus interchange in Singapore with an affluence of more than 85,000 commuters every day. [1]

3.2.3 Importing the geospatial data

The code chunk below

  • uses the st_read() function of sf package to import Busstop shapefile into R as a simple feature data frame called busstop

  • Assigning EPSG code 3414 to busstop data frame.

busstop <- st_read(dsn = "data/geospatial", layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `D:\starweb84\ISSS624\Take-home_Ex\Take-home_Ex1\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
st_crs(busstop)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

Before moving to the next step, we save the output into rds format as a good practice.

write_rds(busstop, "data/rds/busstop.csv")  

3.3 Geospatial data wrangling

3.3.1 Data wrangling with first set of time interval data - Weekday morning peak

In this section, we are going to append the busstop data frame onto origin6_9 data frame first.

origin_data6_9 <- left_join(origin6_9 , busstop,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE)

We check for duplicating records

duplicate6_9 <- origin_data6_9 %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

We use the code chunk below to retain the unique records.

origin_data6_9 <- unique(origin_data6_9)

Using head(), we noticed that the origin_data6_9 does not appear to be in simple feature form.

head(origin_data6_9, n=5)
# A tibble: 5 × 5
  ORIGIN_BS TRIPS BUS_ROOF_N LOC_DESC                       geometry
  <chr>     <dbl> <chr>      <chr>                       <POINT [m]>
1 01012      1770 B03        HOTEL GRAND PACIFIC  (30140.8 31031.95)
2 01013       841 B05        ST JOSEPH'S CH      (30218.75 31126.49)
3 01019      1530 B04        BRAS BASAH CPLX     (30187.77 31034.55)
4 01029      2483 B07        OPP NATL LIB        (30345.83 31007.64)
5 01039      2919 B09        BUGIS CUBE          (30471.08 31175.63)

In the following section, we will create a Choropleth map with a hexagon layer. The code utilises a function called st_make_grid, whcih needs to recognize origin_data as a simple feature form with geospatial information in order to process it. Hence, in anticipation of that, we will convert the data into simple feature form using the below code chunk.

origin_data6_9<- st_as_sf (origin_data6_9)

A check indicates it now to be in simple feature form

head(origin_data6_9, n=5)
Simple feature collection with 5 features and 4 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 30140.8 ymin: 31007.64 xmax: 30471.08 ymax: 31175.63
Projected CRS: SVY21 / Singapore TM
# A tibble: 5 × 5
  ORIGIN_BS TRIPS BUS_ROOF_N LOC_DESC                       geometry
  <chr>     <dbl> <chr>      <chr>                       <POINT [m]>
1 01012      1770 B03        HOTEL GRAND PACIFIC  (30140.8 31031.95)
2 01013       841 B05        ST JOSEPH'S CH      (30218.75 31126.49)
3 01019      1530 B04        BRAS BASAH CPLX     (30187.77 31034.55)
4 01029      2483 B07        OPP NATL LIB        (30345.83 31007.64)
5 01039      2919 B09        BUGIS CUBE          (30471.08 31175.63)

We now replicate the same data wrangling for the other 3 datatables: origin17_20, origin11_14 and origin16_19.

3.3.2 Data wrangling with second set of time interval data - weekday evening peak

3.3.1 For origin17_20:

Append the busstop data frame onto origin17_20 data frame

origin_data17_20 <- left_join(origin17_20 , busstop,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE)

We check for duplicating records

duplicate17_20 <- origin_data17_20 %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

We use the code chunk below to retain the unique records.

origin_data17_20 <- unique(origin_data17_20)

Using head(), we noticed that the origin_data17_20 does not appear to be in simple feature form.

head(origin_data17_20, n=5)
# A tibble: 5 × 5
  ORIGIN_BS TRIPS BUS_ROOF_N LOC_DESC                       geometry
  <chr>     <dbl> <chr>      <chr>                       <POINT [m]>
1 01012      8000 B03        HOTEL GRAND PACIFIC  (30140.8 31031.95)
2 01013      7038 B05        ST JOSEPH'S CH      (30218.75 31126.49)
3 01019      3398 B04        BRAS BASAH CPLX     (30187.77 31034.55)
4 01029      9089 B07        OPP NATL LIB        (30345.83 31007.64)
5 01039     12095 B09        BUGIS CUBE          (30471.08 31175.63)

In the following section, we will create a Choropleth map with a hexagon layer. The code utilises a function called st_make_grid, whcih needs to recognize origin_data as a simple feature form with geospatial information in order to process it. Hence, in anticipation of that, we will convert the data into simple feature form using the below code chunk.

origin_data17_20<- st_as_sf (origin_data17_20)

A check indicates it now to be in simple feature form

head(origin_data17_20, n=5)
Simple feature collection with 5 features and 4 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 30140.8 ymin: 31007.64 xmax: 30471.08 ymax: 31175.63
Projected CRS: SVY21 / Singapore TM
# A tibble: 5 × 5
  ORIGIN_BS TRIPS BUS_ROOF_N LOC_DESC                       geometry
  <chr>     <dbl> <chr>      <chr>                       <POINT [m]>
1 01012      8000 B03        HOTEL GRAND PACIFIC  (30140.8 31031.95)
2 01013      7038 B05        ST JOSEPH'S CH      (30218.75 31126.49)
3 01019      3398 B04        BRAS BASAH CPLX     (30187.77 31034.55)
4 01029      9089 B07        OPP NATL LIB        (30345.83 31007.64)
5 01039     12095 B09        BUGIS CUBE          (30471.08 31175.63)

3.3.3 Data wrangling with third set of time interval data - Weekend/holiday morning peak

3.3.1 For origin11_14:

Append the busstop data frame onto origin11_14 data frame

origin_data11_14 <- left_join(origin11_14 , busstop,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE)

We check for duplicating records

duplicate11_14 <- origin_data11_14 %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

We use the code chunk below to retain the unique records.

origin_data11_14 <- unique(origin_data11_14)

Using head(), we noticed that the origin_data11_14 does not appear to be in simple feature form.

head(origin_data11_14, n=5)
# A tibble: 5 × 5
  ORIGIN_BS TRIPS BUS_ROOF_N LOC_DESC                       geometry
  <chr>     <dbl> <chr>      <chr>                       <POINT [m]>
1 01012      2177 B03        HOTEL GRAND PACIFIC  (30140.8 31031.95)
2 01013      1818 B05        ST JOSEPH'S CH      (30218.75 31126.49)
3 01019      1536 B04        BRAS BASAH CPLX     (30187.77 31034.55)
4 01029      3217 B07        OPP NATL LIB        (30345.83 31007.64)
5 01039      5408 B09        BUGIS CUBE          (30471.08 31175.63)

In the following section, we will create a Choropleth map with a hexagon layer. The code utilises a function called st_make_grid, whcih needs to recognize origin_data as a simple feature form with geospatial information in order to process it. Hence, in anticipation of that, we will convert the data into simple feature form using the below code chunk.

origin_data11_14<- st_as_sf (origin_data11_14)

A check indicates it now to be in simple feature form

head(origin_data11_14, n=5)
Simple feature collection with 5 features and 4 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 30140.8 ymin: 31007.64 xmax: 30471.08 ymax: 31175.63
Projected CRS: SVY21 / Singapore TM
# A tibble: 5 × 5
  ORIGIN_BS TRIPS BUS_ROOF_N LOC_DESC                       geometry
  <chr>     <dbl> <chr>      <chr>                       <POINT [m]>
1 01012      2177 B03        HOTEL GRAND PACIFIC  (30140.8 31031.95)
2 01013      1818 B05        ST JOSEPH'S CH      (30218.75 31126.49)
3 01019      1536 B04        BRAS BASAH CPLX     (30187.77 31034.55)
4 01029      3217 B07        OPP NATL LIB        (30345.83 31007.64)
5 01039      5408 B09        BUGIS CUBE          (30471.08 31175.63)

3.3.4 Data wrangling with fourth set of time interval data - Weekend/holiday evening peak

3.3.1 For origin16_19:

Append the busstop data frame onto origin16_19 data frame

origin_data16_19 <- left_join(origin16_19 , busstop,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE)

We check for duplicating records

duplicate16_19 <- origin_data16_19 %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

We use the code chunk below to retain the unique records.

origin_data16_19 <- unique(origin_data16_19)

Using head(), we noticed that the origin_data16_19 does not appear to be in simple feature form.

head(origin_data16_19, n=5)
# A tibble: 5 × 5
  ORIGIN_BS TRIPS BUS_ROOF_N LOC_DESC                       geometry
  <chr>     <dbl> <chr>      <chr>                       <POINT [m]>
1 01012      3061 B03        HOTEL GRAND PACIFIC  (30140.8 31031.95)
2 01013      2770 B05        ST JOSEPH'S CH      (30218.75 31126.49)
3 01019      1685 B04        BRAS BASAH CPLX     (30187.77 31034.55)
4 01029      4063 B07        OPP NATL LIB        (30345.83 31007.64)
5 01039      7263 B09        BUGIS CUBE          (30471.08 31175.63)

In the following section, we will create a Choropleth map with a hexagon layer. The code utilises a function called st_make_grid, whcih needs to recognize origin_data as a simple feature form with geospatial information in order to process it. Hence, in anticipation of that, we will convert the data into simple feature form using the below code chunk.

origin_data16_19<- st_as_sf (origin_data16_19)

A check indicates it now to be in simple feature form

head(origin_data16_19, n=5)
Simple feature collection with 5 features and 4 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 30140.8 ymin: 31007.64 xmax: 30471.08 ymax: 31175.63
Projected CRS: SVY21 / Singapore TM
# A tibble: 5 × 5
  ORIGIN_BS TRIPS BUS_ROOF_N LOC_DESC                       geometry
  <chr>     <dbl> <chr>      <chr>                       <POINT [m]>
1 01012      3061 B03        HOTEL GRAND PACIFIC  (30140.8 31031.95)
2 01013      2770 B05        ST JOSEPH'S CH      (30218.75 31126.49)
3 01019      1685 B04        BRAS BASAH CPLX     (30187.77 31034.55)
4 01029      4063 B07        OPP NATL LIB        (30345.83 31007.64)
5 01039      7263 B09        BUGIS CUBE          (30471.08 31175.63)

4. Choropleth Visualisation

4.1 Making the Grid

In this section, we will be creating hexagon grids to help in subsequent analysis.
The below code chunk shows the hexagon grid making for the first time interval using the data table origin_data6_9.

4.1.1 Making the Grid for Weekday morning peak data

# Apothem distance is distance from the centre of a regular polygon at right angles to any of its sides. As stated by the data requirement, we set the desired apothem distance at 250
apothem_distance <- 250

# Calculate the side length
side_length <- (2 * apothem_distance) / sqrt(3)

# Use the calculated side length in st_make_grid
area_honeycomb_grid6_9 <- st_make_grid(origin_data6_9, cellsize = c(side_length, side_length), what = "polygons", square = FALSE)

# To sf and add grid ID
honeycomb_grid_sf6_9 = st_sf(area_honeycomb_grid6_9) %>%
  # add grid ID
  mutate(grid_id = 1:length(lengths(area_honeycomb_grid6_9)))

# Use st_intersection to get a new dataset with intersecting points
intersections_data6_9 <- st_intersection(honeycomb_grid_sf6_9, origin_data6_9)

# Aggregate to sum the "TRIPS" values for each grid
sum_trips6_9 <- aggregate(intersections_data6_9$TRIPS, by = list(intersections_data6_9$grid_id), sum, na.rm = TRUE)

# Merge the result back to the honeycomb grid
colnames(sum_trips6_9) <- c("grid_id", "Sum_TRIPS")  
honeycomb_grid_sf6_9 <- merge(honeycomb_grid_sf6_9, sum_trips6_9, by.x = "grid_id", by.y = "grid_id", all.x = TRUE)

# remove grid with value of 0 
honeycomb_count6_9 = filter(honeycomb_grid_sf6_9, Sum_TRIPS > 0)

This is replicated for the remaining 3 time intervals.

4.1.2 Making the Grid for weekday evening peak data

Show the code
# Apothem distance is distance from the centre of a regular polygon at right angles to any of its sides. As stated by the data requirement, we set the desired apothem distance at 250
apothem_distance <- 250

# Calculate the side length
side_length <- (2 * apothem_distance) / sqrt(3)

# Use the calculated side length in st_make_grid
area_honeycomb_grid17_20 <- st_make_grid(origin_data17_20, cellsize = c(side_length, side_length), what = "polygons", square = FALSE)

# To sf and add grid ID
honeycomb_grid_sf17_20 = st_sf(area_honeycomb_grid17_20) %>%
  # add grid ID
  mutate(grid_id = 1:length(lengths(area_honeycomb_grid17_20)))

# Use st_intersection to get a new dataset with intersecting points
intersections_data17_20 <- st_intersection(honeycomb_grid_sf17_20, origin_data17_20)

# Aggregate to sum the "TRIPS" values for each grid
sum_trips17_20 <- aggregate(intersections_data17_20$TRIPS, by = list(intersections_data17_20$grid_id), sum, na.rm = TRUE)

# Merge the result back to the honeycomb grid
colnames(sum_trips17_20) <- c("grid_id", "Sum_TRIPS")  
honeycomb_grid_sf17_20 <- merge(honeycomb_grid_sf17_20, sum_trips17_20, by.x = "grid_id", by.y = "grid_id", all.x = TRUE)

# remove grid with value of 0 
honeycomb_count17_20 = filter(honeycomb_grid_sf17_20, Sum_TRIPS > 0)

4.1.3 Making the Grid for Weekend/holiday morning peak data

Show the code
# Apothem distance is distance from the centre of a regular polygon at right angles to any of its sides. As stated by the data requirement, we set the desired apothem distance at 250
apothem_distance <- 250

# Calculate the side length
side_length <- (2 * apothem_distance) / sqrt(3)

# Use the calculated side length in st_make_grid
area_honeycomb_grid11_14 <- st_make_grid(origin_data11_14, cellsize = c(side_length, side_length), what = "polygons", square = FALSE)

# To sf and add grid ID
honeycomb_grid_sf11_14 = st_sf(area_honeycomb_grid11_14) %>%
  # add grid ID
  mutate(grid_id = 1:length(lengths(area_honeycomb_grid11_14)))

# Use st_intersection to get a new dataset with intersecting points
intersections_data11_14 <- st_intersection(honeycomb_grid_sf11_14, origin_data11_14)

# Aggregate to sum the "TRIPS" values for each grid
sum_trips11_14 <- aggregate(intersections_data11_14$TRIPS, by = list(intersections_data11_14$grid_id), sum, na.rm = TRUE)

# Merge the result back to the honeycomb grid
colnames(sum_trips11_14) <- c("grid_id", "Sum_TRIPS")  
honeycomb_grid_sf11_14 <- merge(honeycomb_grid_sf11_14, sum_trips11_14, by.x = "grid_id", by.y = "grid_id", all.x = TRUE)

# remove grid with value of 0 
honeycomb_count11_14 = filter(honeycomb_grid_sf11_14, Sum_TRIPS > 0)

4.1.4 Making the Grid for Weekend/holiday evening peak data

Show the code
# Apothem distance is distance from the centre of a regular polygon at right angles to any of its sides. As stated by the data requirement, we set the desired apothem distance at 250
apothem_distance <- 250

# Calculate the side length
side_length <- (2 * apothem_distance) / sqrt(3)

# Use the calculated side length in st_make_grid
area_honeycomb_grid16_19 <- st_make_grid(origin_data16_19, cellsize = c(side_length, side_length), what = "polygons", square = FALSE)

# To sf and add grid ID
honeycomb_grid_sf16_19 = st_sf(area_honeycomb_grid16_19) %>%
  # add grid ID
  mutate(grid_id = 1:length(lengths(area_honeycomb_grid16_19)))

# Use st_intersection to get a new dataset with intersecting points
intersections_data16_19 <- st_intersection(honeycomb_grid_sf16_19, origin_data16_19)

# Aggregate to sum the "TRIPS" values for each grid
sum_trips16_19 <- aggregate(intersections_data16_19$TRIPS, by = list(intersections_data16_19$grid_id), sum, na.rm = TRUE)

# Merge the result back to the honeycomb grid
colnames(sum_trips16_19) <- c("grid_id", "Sum_TRIPS")  
honeycomb_grid_sf16_19 <- merge(honeycomb_grid_sf16_19, sum_trips16_19, by.x = "grid_id", by.y = "grid_id", all.x = TRUE)

# remove grid with value of 0 
honeycomb_count16_19 = filter(honeycomb_grid_sf16_19, Sum_TRIPS > 0)

4.2 Geovisualisation and analysis of passenger trips during weekday morning peak

The below code chunk creates the Choropleth map using tmap and the hexagon overlay.

tmap_mode("view")

map_honeycomb6_9 = tm_shape(honeycomb_count6_9) +
  tm_fill(
    col = "Sum_TRIPS",
    palette = "Reds",
    style = "jenks",
    title = "Bus trips during weekday morning peak",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6,
        popup.vars = c(
      "No. of bus trips: " = "Sum_TRIPS"
    ),
    popup.format = list(
      Sum_TRIPS = list(format = "f", digits = 0)
    )
  ) +
  tm_borders(col = "grey40", lwd = 0.7)

map_honeycomb6_9

Geospatial visualization reveals distinctive patterns during weekday morning peaks (6am to 9pm), with a notable surge in the origination of bus trips emanating from residential heartlands like Tampines, Boon Lay, and SengKang. In contrast, fewer bus journeys commence from the bustling city center and industrial zones, where commercial and industrial spaces predominate over residential areas. This phenomenon aligns with the typical morning routine, wherein commuters embark on buses from their homes to workplaces during weekday peak hours.

Upon closer scrutiny, specific zones exhibit exceptionally high trip counts, particularly in proximity to key transit hubs such as Boon Lay Interchange, Woodlands MRT station, the Woodlands checkpoint, and major MRT-bus interchanges. These locations witness a heightened concentration of passengers, suggesting a correlation with areas where individuals assemble to transfer buses for diverse destinations.

While preliminary observations hint at a potential correlation between higher trip numbers and residential heartland areas characterized by numerous HDBs and condominiums, definitive conclusions await the overlay of demographic data. Further analysis incorporating demographic information is essential to validate whether these patterns correlate with income levels or specific residential types, providing a more comprehensive understanding of the dynamics influencing public transit usage in these regions.

4.2 Geovisualisation and analysis of passenger trips during weekday evening peak

The below code chunk creates the Choropleth map using tmap and the hexagon overlay.

tmap_mode("view")

map_honeycomb17_20 = tm_shape(honeycomb_count17_20) +
  tm_fill(
    col = "Sum_TRIPS",
    palette = "Reds",
    style = "jenks",
    title = "Bus trips during weekday evening peak",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6,
        popup.vars = c(
      "No. of bus trips: " = "Sum_TRIPS"
    ),
    popup.format = list(
      Sum_TRIPS = list(format = "f", digits = 0)
    )
  ) +
  tm_borders(col = "grey40", lwd = 0.7)

map_honeycomb17_20

Geospatial visualization exposes a distinctive transit pattern during the weekday evening peak (5 pm to 8 pm), emphasizing the sustained importance of major transit hubs like Boon Lay Interchange and Woodlands MRT station. In the city center, a noticeable increase in bus trips is observed compared to the morning peak, with heightened colors indicating greater activity.

Conversely, residential heartlands experience a decline in originating bus trips during the evening peak, aligning with the post-office hours when commuters return home. Noteworthy exceptions include certain transit hubs, such as Boon Lay Interchange, witnessing a substantial 50% surge compared to the morning. This prompts a vital consideration: assessing whether an increased bus departure frequency is necessary from these hubs to meet the escalated demand during the weekday evening peak. Such analysis is crucial for optimizing transportation services and addressing heightened demand during specific time periods.

4.3 Geovisualisation and analysis of passenger trips during Weekend/holiday morning peak

The below code chunk creates the Choropleth map using tmap and the hexagon overlay.

tmap_mode("view")

map_honeycomb11_14 = tm_shape(honeycomb_count11_14) +
  tm_fill(
    col = "Sum_TRIPS",
    palette = "Blues",
    style = "jenks",
    title = "Bus trips during Weekend/holiday morning peak",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6,
        popup.vars = c(
      "No. of bus trips: " = "Sum_TRIPS"
    ),
    popup.format = list(
      Sum_TRIPS = list(format = "f", digits = 0)
    )
  ) +
  tm_borders(col = "grey40", lwd = 0.7)

map_honeycomb11_14

Geospatial visualization reveals a consistent trend during weekend and holiday morning peaks (11 am to 2 pm), with a notable increase in bus trips originating from transit hubs. However, a distinctive difference is observed in the shades of blue at residential hubs, appearing visibly lighter compared to weekday morning peaks. The overall count of bus trips per zone has significantly decreased in comparison to weekday mornings. The cause of this decline remains uncertain—it could be attributed to a general reduction in weekend travel or a more evenly distributed ridership throughout the day, given the absence of the typical workday morning rush.

The ambiguity arises as weekends often entail fewer people commuting during the morning, possibly due to non-working or half-working days. A comprehensive investigation beyond the peak period is warranted to ascertain whether this decline represents an overall reduction in bus rides across various zones. Delving further into weekend time periods would provide valuable insights into travel patterns and help determine the factors contributing to the observed changes in bus trip frequencies

4.4 Geovisualisation and analysis of passenger trips during Weekend/holiday evening peak

The below code chunk creates the Choropleth map using tmap and the hexagon overlay.

tmap_mode("view")

map_honeycomb16_19 = tm_shape(honeycomb_count16_19) +
  tm_fill(
    col = "Sum_TRIPS",
    palette = "Blues",
    style = "jenks",
    title = "Bus trips during Weekend/holiday evening peak",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6,
        popup.vars = c(
      "No. of bus trips: " = "Sum_TRIPS"
    ),
    popup.format = list(
      Sum_TRIPS = list(format = "f", digits = 0)
    )
  ) +
  tm_borders(col = "grey40", lwd = 0.7)

map_honeycomb16_19

Geospatial visualization illustrates a consistent trend during weekend and holiday evening peaks (4 pm to 7 pm), with an increased number of trips originating from transit hubs. However, a discernible decline in overall bus trips per zone, compared to weekday evenings, raises questions about the reasons behind this reduction. It remains unclear whether this decline is due to fewer people traveling on weekends or if there’s a more evenly distributed ridership throughout the day.To comprehensively understand these dynamics, additional research beyond peak hours during weekends is warranted.

Notably, during this period, heightened activity is observed in zones near Orchard Road and Somerset Road, indicating an increased number of trips, possibly from entertainment and shopping districts. This aligns with the expectation that people, having spent their weekends or holidays in these areas, are heading home during the evening peak. Consequently, bus operators can consider adjusting frequencies to accommodate the higher demand in these entertainment and shopping districts during weekend and holiday evenings.